TSM - Assignment.
Collect real and nominal yield data from the Federal Reserve covering the period January 2005 to August 2022.
import pandas as pd
from helpers import *
import plotly.graph_objects as go
import plotly.express as px
from statsmodels.tsa.stattools import adfuller
import chart_studio.plotly as py
import plotly.io as pio
import plotly.offline as pyo
import numpy as np
#
pio.renderers.default = "notebook"
# collect data from files
nominal_yields = pd.read_excel('nominal_yields.xlsx').dropna()
real_yields = pd.read_excel('real_yields.xlsx').dropna()
# show the first 5 rows of nominal yields
nominal_yields.head()
| | Date | BETA0 | BETA1 | BETA2 | BETA3 | SVEN1F01 | SVEN1F04 | SVEN1F09 | SVENF01 | SVENF02 | ... | SVENY23 | SVENY24 | SVENY25 | SVENY26 | SVENY27 | SVENY28 | SVENY29 | SVENY30 | TAU1 | TAU2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2005-01-03 | 6.189826e-06 | 2.277828 | 2.659254 | 16.057365 | 3.4022 | 4.3567 | 5.7028 | 3.1680 | 3.5462 | ... | 5.0962 | 5.0874 | 5.0729 | 5.0535 | 5.0295 | 5.0014 | 4.9697 | 4.9348 | 1.436363 | 13.431858 |
| 1 | 2005-01-04 | 2.549831e-09 | 2.259019 | 3.046883 | 16.223499 | 3.5231 | 4.4071 | 5.7558 | 3.2872 | 3.6552 | ... | 5.1584 | 5.1495 | 5.1350 | 5.1155 | 5.0913 | 5.0631 | 5.0312 | 4.9960 | 1.393715 | 13.471621 |
| 2 | 2005-01-05 | 4.330379e-05 | 2.259806 | 3.089485 | 16.079062 | 3.5391 | 4.3983 | 5.7099 | 3.3009 | 3.6710 | ... | 5.1227 | 5.1131 | 5.0981 | 5.0780 | 5.0535 | 5.0249 | 4.9928 | 4.9574 | 1.412061 | 13.445450 |
| 3 | 2005-01-06 | 1.834049e-12 | 2.257302 | 2.977634 | 16.110743 | 3.5090 | 4.4031 | 5.7234 | 3.2678 | 3.6475 | ... | 5.1278 | 5.1183 | 5.1032 | 5.0831 | 5.0586 | 5.0299 | 4.9977 | 4.9623 | 1.423321 | 13.430211 |
| 4 | 2005-01-07 | 1.780984e-12 | 2.304075 | 2.956248 | 16.007722 | 3.5372 | 4.4163 | 5.6898 | 3.2920 | 3.6785 | ... | 5.1104 | 5.1006 | 5.0854 | 5.0651 | 5.0405 | 5.0118 | 4.9796 | 4.9442 | 1.476347 | 13.455895 |
5 rows × 100 columns
# show the first 5 rows of real yields
real_yields.head()
| | Date | BETA0 | BETA1 | BETA2 | BETA3 | BKEVEN02 | BKEVEN03 | BKEVEN04 | BKEVEN05 | BKEVEN06 | ... | TIPSY11 | TIPSY12 | TIPSY13 | TIPSY14 | TIPSY15 | TIPSY16 | TIPSY17 | TIPSY18 | TIPSY19 | TIPSY20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2005-01-03 | -1.504322 | 1.565163 | -3167.402774 | 3176.711476 | 2.5935 | 2.5925 | 2.5779 | 2.5709 | 2.5767 | ... | 1.8002 | 1.8722 | 1.9313 | 1.9784 | 2.0144 | 2.0401 | 2.0565 | 2.0643 | 2.0644 | 2.0575 |
| 1 | 2005-01-04 | -0.973675 | 1.184436 | -3682.065233 | 3690.256864 | 2.5802 | 2.5909 | 2.5779 | 2.5701 | 2.5747 | ... | 1.8629 | 1.9333 | 1.9910 | 2.0369 | 2.0718 | 2.0967 | 2.1123 | 2.1197 | 2.1196 | 2.1127 |
| 2 | 2005-01-05 | -0.692399 | 0.976122 | -3186.398677 | 3193.908160 | 2.5502 | 2.5701 | 2.5611 | 2.5536 | 2.5569 | ... | 1.8764 | 1.9459 | 2.0028 | 2.0479 | 2.0821 | 2.1062 | 2.1213 | 2.1281 | 2.1276 | 2.1205 |
| 3 | 2005-01-06 | -0.344569 | 0.609703 | -1901.199337 | 1907.956756 | 2.5499 | 2.5726 | 2.5662 | 2.5605 | 2.5645 | ... | 1.8695 | 1.9399 | 1.9974 | 2.0428 | 2.0772 | 2.1015 | 2.1167 | 2.1237 | 2.1234 | 2.1168 |
| 4 | 2005-01-07 | -0.535426 | 0.854999 | -2554.392640 | 2561.606742 | 2.5209 | 2.5466 | 2.5423 | 2.5372 | 2.5405 | ... | 1.9012 | 1.9703 | 2.0267 | 2.0715 | 2.1053 | 2.1292 | 2.1441 | 2.1509 | 2.1504 | 2.1435 |
5 rows × 127 columns
# select specific maturities
nominal_yields_2_10y_eom = selectYieldsMaturities(nominal_yields,type='nominal',FD=False)
real_yields_2_10y_eom = selectYieldsMaturities(real_yields,type='real',FD=False)
# plot nominal data
plotYield(nominal_yields_2_10y_eom,columns=['SVENY02','SVENY03','SVENY05','SVENY07','SVENY10'],type='Nominal',FD=False)
# plot real yield data (TIPS)
plotYield(real_yields_2_10y_eom,columns=['TIPSY02','TIPSY03','TIPSY05','TIPSY07','TIPSY10'],type='Real',FD=False)
We have extracted and plotted the data. Before proceeding with PCA, it is good practice to check that the series under consideration are stationary. The plots of nominal and real yields both show clear trends, which should be removed (or at least reduced as far as possible) so that the statistical results can be interpreted correctly.
# check for stationarity in nominal yields
ADFtest(nominal_yields_2_10y_eom,type='nominal',maturity='2y')
ADF Statistic: -1.784198 p-value: 0.388335 Critical Values: 1%: -3.476 5%: -2.881 10%: -2.577 Non-Stationary timeserie
(-1.7841977737615378, 0.3883348001847712)
# let's try for 5y yields
ADFtest(nominal_yields_2_10y_eom,type='nominal',maturity='5y')
ADF Statistic: -1.843918 p-value: 0.358886 Critical Values: 1%: -3.475 5%: -2.881 10%: -2.577 Non-Stationary timeserie
(-1.8439183137699224, 0.3588855951201053)
# let's try for 10y yields
ADFtest(nominal_yields_2_10y_eom,type='nominal',maturity='10y')
ADF Statistic: -1.902721 p-value: 0.330782 Critical Values: 1%: -3.475 5%: -2.881 10%: -2.577 Non-Stationary timeserie
(-1.9027209249148123, 0.3307821864201821)
# nominal yields are non-stationary, so we go back to the original data, take first differences, and keep only the month-end observations
nominal_yields_2_10y_eom_shifted = selectYieldsMaturities(nominal_yields,type='nominal',FD=True)
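A minimal sketch of the "difference first, then keep month ends" step described in the comment above, assuming a daily DataFrame indexed by date. The function name `month_end_first_difference` and the toy data are hypothetical. The actual `selectYieldsMaturities` helper may be implemented differently.

```python
import numpy as np
import pandas as pd


def month_end_first_difference(daily: pd.DataFrame, columns: list) -> pd.DataFrame:
    """First-difference daily yields, then keep each month's last observation."""
    # Day-over-day change; the first row has no predecessor and is dropped
    diffed = daily[columns].diff().dropna()
    # Group rows by calendar month and keep the last available row in each
    return diffed.groupby(diffed.index.to_period("M")).tail(1)


# Toy data (assumption): 60 business days of linearly rising yields
dates = pd.bdate_range("2005-01-03", periods=60)
toy = pd.DataFrame(
    {
        "SVENY02": 3.0 + 0.01 * np.arange(60),
        "SVENY10": 4.0 + 0.02 * np.arange(60),
    },
    index=dates,
)

result = month_end_first_difference(toy, ["SVENY02", "SVENY10"])
print(result)
```

The order of the two operations matters. Differencing the daily data and then sampling gives the one-day change on each month-end date, whereas sampling month ends first and then differencing would give month-over-month changes.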
# check stationarity after first difference (only for the 2y)
ADFtest(nominal_yields_2_10y_eom_shifted,type='nominal',maturity='2y')
ADF Statistic: -7.123238 p-value: 0.000000 Critical Values: 1%: -3.477 5%: -2.882 10%: -2.578 Stationary timeserie
(-7.123237767881004, 3.6760004763770164e-10)
# plot the stationary (first-differenced) nominal yields
plotYield(nominal_yields_2_10y_eom_shifted,columns=['SVENY02','SVENY03','SVENY05','SVENY07','SVENY10'],type='Nominal',FD=True)
# check for stationarity in real yields
ADFtest(real_yields_2_10y_eom,type='real',maturity='2y')
ADF Statistic: -2.347080 p-value: 0.157233 Critical Values: 1%: -3.476 5%: -2.881 10%: -2.577 Non-Stationary timeserie
(-2.347080382879149, 0.15723267907692623)
# let's try with 5y
ADFtest(real_yields_2_10y_eom,type='real',maturity='5y')
ADF Statistic: -2.058364 p-value: 0.261583 Critical Values: 1%: -3.475 5%: -2.881 10%: -2.577 Non-Stationary timeserie
(-2.05836398030184, 0.2615826889940359)
# let's try with 10y
ADFtest(real_yields_2_10y_eom,type='real',maturity='10y')
ADF Statistic: -1.859705 p-value: 0.351248 Critical Values: 1%: -3.476 5%: -2.882 10%: -2.577 Non-Stationary timeserie
(-1.8597053177414344, 0.351247809531534)
# real yields are non-stationary, so we go back to the original data, take first differences, and keep only the month-end observations
real_yields_2_10y_eom_shifted = selectYieldsMaturities(real_yields,type='real',FD=True)
# check stationarity after first difference (only for the 2y)
ADFtest(real_yields_2_10y_eom_shifted,type='real',maturity='2y')
ADF Statistic: -13.979960 p-value: 0.000000 Critical Values: 1%: -3.475 5%: -2.881 10%: -2.577 Stationary timeserie
(-13.97996046052464, 4.194864820791476e-26)
# plot the stationary (first-differenced) real yields
plotYield(real_yields_2_10y_eom_shifted,columns=['TIPSY02','TIPSY03','TIPSY05','TIPSY07','TIPSY10'],type='Real',FD=True)
The preliminary analysis is now complete: the data have been first-differenced to ensure stationarity. We can now proceed with principal component analysis (PCA) on the stationary data.
# PCA